Add support for the parameter-shift hessian to the Autograd interface #1131
Conversation
Codecov Report
```
@@           Coverage Diff           @@
##           master    #1131   +/-   ##
=======================================
  Coverage   98.12%   98.12%
=======================================
  Files         144      144
  Lines       10813    10833    +20
=======================================
+ Hits        10610    10630    +20
  Misses        203      203
```
Continue to review full report at Codecov.
The matrix caching scares me.
```python
def gradient_product(g):
# In autograd, the forward pass is always performed prior to the backwards
# pass, so we do not need to re-unwrap the parameters.
saved_grad_matrices = {}
```
Not a big fan of a global storing solution. This is going to kill memory usage if the user runs a ton of experiments.
Good point. The idea here is to simply cache the hessian/jacobian for the same parameter value, if vjp is called repeatedly by Autograd. So repeated calls by the user via qml.jacobian won't do any caching, but a single call by the user to qml.jacobian will cache under the hood for autograd.
This is basically a workaround for Python not having first-class dynamic programming (which would be very useful here).
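As an illustration of the pattern being described, here is a minimal sketch; the names make_gradient_product and compute_jacobian are made up for the example and are not the actual PennyLane implementation:

```python
def make_gradient_product(compute_jacobian, params):
    # `compute_jacobian` stands in for the expensive parameter-shift evaluation
    saved_grad_matrices = {}  # lives only as long as the returned closure

    def gradient_product(g):
        # evaluate the Jacobian at most once for these parameter values,
        # even if Autograd calls this VJP several times
        if "jacobian" not in saved_grad_matrices:
            saved_grad_matrices["jacobian"] = compute_jacobian(params)
        return g @ saved_grad_matrices["jacobian"]

    return gradient_product
```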
Regarding memory usage: I think this should be safe, since this isn't global storage; instead, it is locally scoped to within the vjp function. Once the call to vjp() is complete, CPython should delete the saved_grad_matrices data.
```python
@autograd.extend.primitive
def jacobian(p):
    jacobian = _evaluate_grad_matrix(p, "jacobian")
```
Why are you storing the results in a dictionary with string keys? Especially if you only have two keys.
Seems... dangerous lol.
I'm not sure I entirely follow?
In order to modify a variable used in a closure, it needs to be a mutable type, so either a dictionary, set, or list. If that's what you are asking about.
I was confused about that and had to do some background reading about closures to figure it out. Maybe there should be a comment about that?
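A minimal sketch of that point (illustrative only, not code from this PR):

```python
def make_counter():
    # rebinding a plain name inside `increment` would create a new local
    # variable (or require `nonlocal`); mutating a dict from the enclosing
    # scope works directly, which is why a dict is used as the cache
    state = {"count": 0}

    def increment():
        state["count"] += 1
        return state["count"]

    return increment

counter = make_counter()
counter()  # 1
counter()  # 2
```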
```python
self.set_parameters(self._all_parameter_values, trainable_only=False)

# only flatten g if all parameters are single values
saved_grad_matrices[grad_matrix_fn] = grad_matrix
```
Is saved_grad_matrices ever cleared? How does it behave between runs? Could there be unintended state poisoning between calls to this method?
Let me write some tests for this to make sure it works as I expect.
> Is saved_grad_matrices ever cleared?

I believe it is cleared by nature of CPython deleting local variables once they go out of scope. That is, once the call to vjp() is complete, saved_grad_matrices will be deleted.
I'm not 100% sure though, since it is used in locally defined functions that are returned...
Is there a way of testing/verifying this?
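One possible way to check this (a sketch only, not part of the PR; plain dicts don't support weak references, so a small dict subclass stands in for saved_grad_matrices):

```python
import gc
import weakref

class Cache(dict):
    """dict subclass so we can take a weak reference to it."""

def make_vjp_like():
    saved_grad_matrices = Cache()  # stands in for the real cache

    def vjp(g):
        saved_grad_matrices.setdefault("jacobian", g)
        return saved_grad_matrices["jacobian"]

    return vjp, weakref.ref(saved_grad_matrices)

vjp, cache_ref = make_vjp_like()
vjp(1.0)
assert cache_ref() is not None  # cache is alive while vjp is reachable

del vjp
gc.collect()
assert cache_ref() is None  # cache freed once the closure is gone
```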
Yep, you are correct, that is what will happen. That dictionary is in the namespace of the vjp function, and not the class as I originally thought. This should be fine then.
Benchmarking courtesy of @mariaschuld: runtime and memory usage plots.
Thanks @josh146! Seems like a nice addition, although I'm a bit confused about exactly how autograd works here, so it would be nice to have a brief chat about it sometime.
```python
def gradient_product(g):
# In autograd, the forward pass is always performed prior to the backwards
# pass, so we do not need to re-unwrap the parameters.
saved_grad_matrices = {}
```
Just a thought, and not necessarily better than the current solution, but could there be a case here for saving them as saved_hessian and saved_jacobian instead of storing them in a single dict? 🤔
I'm not sure I see the advantage over a dictionary - in Python, a lot of caching is done with dictionaries, including class properties, memoization, etc.
```python
def test_grad_matrix_caching(self, mocker, dev_name, diff_method):
    """Test the gradient matrix caching"""
```
How is this testing the caching? It seems like it's only testing that it doesn't cache when calling qml.jacobian or qml.hessian several times (which is good, although not what I would expect from the naming and docstring for this test).
Good point - this is still WIP; I added it after @Thenerdstation's comments. I'm still not fully sure how to test this, it might require usage of https://docs.python.org/3/library/gc.html?
I've added a bit more of an exploration here: #1131 (comment)
```python
def vhp(ans, p):
    def hessian_product(ddy):
```
I think it'd be good to get some more details on what's happening here. I'm not fully sure what's going on tbh (mainly autograd-wise).
Co-authored-by: Theodor <theodor@xanadu.ai>
@Thenerdstation I've explored the issue of the cache further, just putting my exploration here for now. Autograd's grad function is essentially implemented as follows:

```python
def grad(fun, x):
    vjp, ans = _make_vjp(fun, x)
    grad_value = vjp(vspace(ans).ones())
    return grad_value, ans
```

where the returned vjp function holds the saved_grad_matrices cache via closure. By using Python's __closure__ attribute, we can dig into the returned vjp function and inspect the cache while it is still in scope:

```python
def grad(fun, x):
    vjp, ans = _make_vjp(fun, x)

    ################################################################
    # The returned vjp function contains the saved matrix cache via closure.
    # Let's extract this.
    saved_matrices = []

    import autograd

    # extract the end node from the VJP closure
    end_node = vjp.__closure__[0].cell_contents
    outgrads = {end_node: (vspace(ans).ones(), False)}

    # iterate through all nodes in the computational graph
    for node in autograd.tracer.toposort(end_node):
        outgrad = outgrads.pop(node)
        ingrads = node.vjp(outgrad[0])

        try:
            # look deep inside the nested function closure, and attempt to
            # extract the saved matrices dictionary cache if the node is a QNode
            qnode_cache = (
                node.vjp
                .__closure__[0].cell_contents
                .__closure__[0].cell_contents
                .__closure__[1].cell_contents
            )

            if isinstance(qnode_cache, dict) and "jacobian" in qnode_cache:
                saved_matrices.append(qnode_cache)
        except:
            # the node was not a QNode; pass.
            pass

        for parent, ingrad in zip(node.parents, ingrads):
            outgrads[parent] = autograd.core.add_outgrads(outgrads.get(parent), ingrad)

    print(saved_matrices)
    ################################################################

    grad_value = vjp(vspace(ans).ones())
    return grad_value, ans  # after returning, vjp is now out of scope
```

So, inside the grad function, the cached Jacobian/Hessian matrices are still reachable through the vjp closure. As far as I can tell, however, as soon as the vjp function goes out of scope, the cache is garbage collected along with it.
pennylane/interfaces/autograd.py
```python
if dy.size > 1:
    if all(np.ndim(p) == 0 for p in params):
        # only flatten dy if all parameters are single values
        vhp = dy.flatten() @ ddy @ hessian @ dy.flatten()
```
I don't understand why we have dy twice here.
I'm not sure I can fully answer that, but it makes sense if you consider the shape of the hessian and perform dimensional analysis:
- the hessian will be of shape (num_params, num_params, num_output), dy (think of it like the gradient) has shape (num_output,), and ddy (think of it like the gradient of the gradient) has shape (num_output, num_params).

So performing the matrix multiplication will return a vhp of the right size, (num_params,).
E.g.:
```python
>>> num_params = 2
>>> num_output = 3
>>> dy = np.random.random([num_output])
>>> ddy = np.random.random([num_output, num_params])
>>> hessian = np.random.random([num_params, num_params, num_output])
>>> dy @ ddy @ hessian @ dy
array([1.82038363, 2.03458631])
```
```python
dev = qml.device(dev_name, wires=1)

@qnode(dev, diff_method=diff_method, interface="autograd")
```
I wonder if we should create an "analytic circuit" set of fixtures for the testing suite and be able to easily reuse the quantum function, result, derivative, and hessian across the different interfaces.
This is a good point 🤔 Many more tests like this exist across the interface test suite, so maybe it's something worth considering in a new PR/issue?
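For concreteness, a rough sketch of what such a fixture could look like; the circuit, values, and name analytic_circuit_case are illustrative only, not taken from this PR:

```python
import numpy as np
import pytest

@pytest.fixture
def analytic_circuit_case():
    """An analytically solvable case (e.g. RX(a), RY(b), expval(PauliZ(0)))
    together with its expected result, gradient, and Hessian, reusable
    across the different interface test suites."""

    def expectation(a, b):
        return np.cos(a) * np.cos(b)

    def gradient(a, b):
        return np.array([-np.sin(a) * np.cos(b), -np.cos(a) * np.sin(b)])

    def hessian(a, b):
        return np.array(
            [[-np.cos(a) * np.cos(b), np.sin(a) * np.sin(b)],
             [np.sin(a) * np.sin(b), -np.cos(a) * np.cos(b)]]
        )

    return expectation, gradient, hessian
```

A test could then request this fixture and compare the QNode's qml.jacobian/qml.hessian output against the expected gradient and hessian for each interface.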
Summary of things to change:
- comment clarifying closure and saved_grad_matrices
- redefining p in a comprehension

Optional suggestion:
- analytic circuits test fixture file
There are still things I don't completely understand about this, but it works and that is the most important thing :)
Thanks @albi3ro! Your comment about the parameter flattening if statement really helped make the PR better.
Co-authored-by: Theodor <theodor@xanadu.ai>
Excited to have this! :)
Co-authored-by: Theodor <theodor@xanadu.ai>
Found something to fix before getting merged in. Could definitely cause bugs down the line.
Updating from "request changes" to "approve" since the change isn't actually needed.
Context:
The Autograd custom gradient interface currently returns a non-differentiable gradient.
Description of the Change:
Modifies the Autograd interface so that the Jacobian also defines a custom gradient --- the vector-Hessian product --- where the Hessian is computed by an efficient implementation of the parameter-shift rule.
Autograd QNodes can now be doubly differentiated, on both simulators and hardware:
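For example (a usage sketch; the circuit below is illustrative and not taken from the PR):

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="autograd", diff_method="parameter-shift")
def circuit(p):
    qml.RX(p[0], wires=0)
    qml.RY(p[1], wires=0)
    return qml.expval(qml.PauliZ(0))

p = np.array([0.5, 0.1], requires_grad=True)

# first derivative: uses the parameter-shift vector-Jacobian product
grad_fn = qml.grad(circuit)
print(grad_fn(p))

# second derivative: the Jacobian of the gradient, i.e. the Hessian,
# now computed via the parameter-shift vector-Hessian product
print(qml.jacobian(grad_fn)(p))
```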
In addition, a slight tweak was made to the Jacobian/Hessian evaluation logic in order to avoid redundant Jacobian/Hessian computations.
Benefits:
Autograd QNodes now support double derivatives on hardware and software
The Jacobian/Hessian is now only computed once for given input parameters, and re-used for multiple VJPs/VHPs.
Possible Drawbacks:
3rd derivatives and higher are not supported. We could potentially support higher derivatives using recursion.
Currently, Jacobian parameter-shifts are not being re-used for the Hessian parameter-shifts. This requires more thinking.
I realised while writing this PR that the parameter-shift Hessian logic is based on the formula given here: https://arxiv.org/abs/2008.06517. However, this formula assumes all gates support the 2-term parameter-shift rule; this is not the case for the controlled rotations. Therefore, computing the Hessian of the controlled rotations will give an incorrect result!
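For reference, the 2-term assumption corresponds to the following second-order shift rule (sketched here for a shift of $\pi/2$; my paraphrase of the idea, not a quote from the paper):

$$
\frac{\partial^2 f}{\partial\theta_i\,\partial\theta_j}
= \frac{1}{4}\left[
f\!\left(\theta + \tfrac{\pi}{2}(e_i + e_j)\right)
- f\!\left(\theta + \tfrac{\pi}{2}(e_i - e_j)\right)
- f\!\left(\theta - \tfrac{\pi}{2}(e_i - e_j)\right)
+ f\!\left(\theta - \tfrac{\pi}{2}(e_i + e_j)\right)
\right]
$$

which follows from applying the 2-term gradient rule $\partial_i f(\theta) = \tfrac{1}{2}\left[f(\theta + \tfrac{\pi}{2}e_i) - f(\theta - \tfrac{\pi}{2}e_i)\right]$ twice; it breaks down for gates such as the controlled rotations, whose gradient requires a 4-term rule.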
Long term, we should consider implementing parameter-shift logic as follows:
- compute the vjp and vhp directly. This avoids redundant parameter-shift evaluations.

Related GitHub Issues: n/a